MixApart: Decoupled Analytics for Shared Storage Systems

Authors

  • Madalin Mihailescu
  • Gokul Soundararajan
  • Cristiana Amza
Abstract

Data analytics and enterprise applications have very different storage functionality requirements. For this reason, enterprise deployments of data analytics are on a separate storage silo. This may generate additional costs and inefficiencies in data management, e.g., whenever data needs to be archived, copied, or migrated across silos. We introduce MixApart, a scalable data processing framework for shared enterprise storage systems. With MixApart, a single consolidated storage back-end manages enterprise data and services all types of workloads, thereby lowering hardware costs and simplifying data management. In addition, MixApart enables the local storage performance required by analytics through an integrated data caching and scheduling solution. Our preliminary evaluation shows that MixApart can be 45% faster than the traditional ingest-then-compute workflow used in enterprise IT analytics, while requiring one third of storage capacity when compared to HDFS.

Similar resources

Architecture for Hadoop Distributed File Systems

The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably, and to stream those data sets at high bandwidth to user applications. In a large cluster, thousands of servers both host directly attached storage and execute user application tasks. By distributing storage and computation across many servers, the resource can grow with demand while remaining economica...

A New Non-linear Control of the Four-Leg Inverter with Decoupled Model and Fast Dynamic Response for PV Generation Systems

Distributed generation (DG) will play an important role in future power generation systems, especially in stand-alone applications. The three-phase four-leg inverter is a well-known topology that can be used as an interface power converter for DGs. Because the fourth leg provides the neutral path, the four-leg inverter is able to supply balanced loads as well as unbalanced loads. In this paper...

Decoupled Interconnection of Distributed Memory Models

In this paper we present a framework to formally describe and study the interconnection of distributed shared memory systems. In our models we minimize the dependencies between the original systems and the interconnection system (that is, they are decoupled) and consider systems implemented with invalidation and propagation. We first show that only fast (i.e. wait-free) memory models can be int...

SQL-on-Hadoop: Full Circle Back to Shared-Nothing Database Architectures

SQL query processing for analytics over Hadoop data has recently gained significant traction. Among many systems providing some SQL support over Hadoop, Hive is the first native Hadoop system that uses an underlying framework such as MapReduce or Tez to process SQL-like statements. Impala, on the other hand, represents the new emerging class of SQL-on-Hadoop systems that exploit a shared-nothin...

Societal Needs, Shared-Value Models, Performance Indicators, Big Data, Business Analytics Models and Tools

In Chapter 1, the CAM framework focused on the development of innovative social business models through the usage of frontier data envelopment analysis to measure shared value for a sustainable growth of an organization. Chapter 2 first discusses the causes and effects of societal challenges and how shared value models can alleviate them. Second, successful technology and non-technology innovat...

Publication date: 2012